Spark Summit 2017

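Somehow this one felt even shabbier than Spark Summit East earlier this year? There wasn't even breakfast in the morning, and the swag bag was full of nothing but ads.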

Keynote

A Deep Dive into Spark SQL’s Catalyst Optimizer

I had already seen this talk at Spark Summit East, but it is still interesting to know that we can define query plans ourselves.

  • TODO: add code from the slides later.
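Until the slide code is added, here is a toy Python sketch of the core Catalyst idea: a query plan is a tree, and optimization means applying rewrite rules to that tree. `transform_up` is loosely modeled on Catalyst's `transformUp`, and the `Literal`, `Attribute` and `Add` classes are hypothetical stand-ins, not Spark's API. In real Spark, custom rules are written in Scala as `Rule[LogicalPlan]` and can be registered through `spark.experimental.extraOptimizations`.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Literal, Attribute, Add]


def transform_up(expr: Expr, rule: Callable[[Expr], Expr]) -> Expr:
    """Apply `rule` to every node, children first (bottom-up)."""
    if isinstance(expr, Add):
        expr = Add(transform_up(expr.left, rule), transform_up(expr.right, rule))
    return rule(expr)


def constant_folding(expr: Expr) -> Expr:
    """Rule: replace Add(Literal, Literal) with one Literal; leave other nodes alone."""
    if isinstance(expr, Add) and isinstance(expr.left, Literal) and isinstance(expr.right, Literal):
        return Literal(expr.left.value + expr.right.value)
    return expr


# x + (1 + 2)  is rewritten to  x + 3
plan = Add(Attribute("x"), Add(Literal(1), Literal(2)))
optimized = transform_up(plan, constant_folding)
print(optimized)  # Add(left=Attribute(name='x'), right=Literal(value=3))
```

Because the traversal is bottom-up, nested constants fold in a single pass: `(1 + 2) + 3` first becomes `3 + 3` and then `6`.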

Lazy Join Optimizations Without Upfront Statistics

A project from UCLA, and a really interesting talk. It mainly tries to solve two problems:

  1. Unlike in a traditional RDBMS, generating statistics for query optimization is hard.
  2. Choosing the correct order of the WHERE-clause predicates in a join, so that we can reduce the shuffle.
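To make "no upfront statistics" concrete, here is a minimal greedy sketch, assuming small in-memory tables. Instead of estimating cardinalities before running, it starts from the table that is actually smallest and, at each step, joins whichever connected table yields the smallest observed intermediate result. This is a simplified illustration, not the UCLA project's algorithm; `natural_join` and `greedy_lazy_join` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Nested-loop natural join on all column names the two tables share."""
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])
    return [{**l, **r} for l in left for r in right if all(l[c] == r[c] for c in shared)]


def greedy_lazy_join(tables: Dict[str, List[Row]]) -> Tuple[List[str], List[Row]]:
    """Pick the join order at run time from observed sizes, not upfront statistics."""
    remaining = dict(tables)
    first = min(remaining, key=lambda n: len(remaining[n]))  # actually smallest table
    result = remaining.pop(first)
    join_order = [first]
    while remaining:
        cols = set(result[0]) if result else set()
        # Prefer tables that share a column with the current result (avoid cross joins).
        connected = [n for n in remaining if remaining[n] and cols & set(remaining[n][0])]
        candidates = {n: natural_join(result, remaining[n]) for n in (connected or list(remaining))}
        best = min(candidates, key=lambda n: len(candidates[n]))  # smallest observed output
        result = candidates[best]
        del remaining[best]
        join_order.append(best)
    return join_order, result


users = [{"uid": 1, "city": "LA"}, {"uid": 2, "city": "SF"}, {"uid": 3, "city": "LA"}]
orders = [{"uid": 1, "oid": 10}, {"uid": 1, "oid": 11}, {"uid": 2, "oid": 12}, {"uid": 3, "oid": 13}]
vip = [{"uid": 2}]

join_order, result = greedy_lazy_join({"users": users, "orders": orders, "vip": vip})
print(join_order, result)
```

Here the selective `vip` table is applied first, so every intermediate result has one row; joining `users` with `orders` first would materialize 4 rows before `vip` cuts them down. In a distributed join, fewer intermediate rows means less data to shuffle.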

The Top Five Mistakes Made When Writing Streaming Applications

A useful talk. It covers many good practices for writing Spark Streaming applications, including which situations are a good or bad fit for Spark Streaming.

  • Need to wait until the slides are published before summarizing.

Building Data Product Based on Apache Spark at Airbnb

  1. Unify batch processing and stream processing. This seems to be really popular these days.
  2. Shared storage
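One way to read "unify batch and streaming" is that the same business logic runs over a bounded dataset and over an unbounded sequence of micro-batches, and both produce the same answer. A minimal sketch under that assumption; the `listing_id` field and the helper names are hypothetical, not Airbnb's code.

```python
from collections import Counter
from typing import Iterable, Optional


def count_by_listing(events: Iterable[dict], state: Optional[Counter] = None) -> Counter:
    """Shared logic: count events per listing_id, optionally on top of prior state."""
    result = Counter() if state is None else Counter(state)
    for e in events:
        result[e["listing_id"]] += 1
    return result


def run_batch(events):
    # Batch: the whole dataset in one call, no prior state.
    return count_by_listing(events)


def run_streaming(micro_batches):
    # Streaming: the same function, carrying state from one micro-batch to the next.
    state = Counter()
    for batch in micro_batches:
        state = count_by_listing(batch, state)
    return state


events = [{"listing_id": "a"}, {"listing_id": "b"}, {"listing_id": "a"},
          {"listing_id": "c"}, {"listing_id": "a"}]
micro_batches = [events[:2], events[2:4], events[4:]]
print(run_batch(events) == run_streaming(micro_batches))  # True
```

Keeping the logic in one function means a backfill (batch) and the live pipeline (streaming) cannot drift apart in how they compute the metric.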

Question:

  • Is it really a good idea to use Spark Streaming for long-window calculations?
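The concern behind this question is mostly state. A minimal sketch, assuming an exact sliding-window sum that keeps raw events: every event inside the window must be retained so it can be subtracted when it expires, so state grows linearly with the window length. Real engines such as Spark Structured Streaming keep per-window partial aggregates rather than raw events, but they still have to retain state until a window closes. `SlidingWindowSum` and `peak_state` are hypothetical names.

```python
from collections import deque


class SlidingWindowSum:
    """Exact sum over the window (ts - window, ts], keeping every event as state."""

    def __init__(self, window: int):
        self.window = window
        self.events = deque()  # (timestamp, value) pairs still inside the window
        self.total = 0

    def add(self, ts: int, value: int) -> int:
        self.events.append((ts, value))
        self.total += value
        # Evict events that have fallen out of the window.
        while self.events and self.events[0][0] <= ts - self.window:
            _, old = self.events.popleft()
            self.total -= old
        return self.total


def peak_state(window: int, n_events: int = 2000) -> int:
    """Largest number of events held at once, for one event per time unit."""
    w = SlidingWindowSum(window)
    peak = 0
    for t in range(n_events):
        w.add(t, 1)
        peak = max(peak, len(w.events))
    return peak


print(peak_state(10), peak_state(1000))  # 10 1000
```

A 100x longer window means 100x more state to keep in memory and checkpoint, which is why long windows are often computed in batch instead.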

Experiences Migrating Hive Workload to SparkSQL

Facebook is trying to migrate from Hive to Spark SQL to gain better performance, so that users can write either HiveQL or Spark SQL. The underlying engine runs the queries and translates, estimates, or compares HiveQL against Spark SQL. I think this is too specific to Facebook; they wouldn't disclose how they do it or what performance gain they got. Still, I can't imagine how big the data running through those pipelines at Facebook must be.
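The talk did not say how results were compared. As a hypothetical illustration only, one simple way to validate such a migration is to run the same query on both engines and compare the outputs as multisets of rows, ignoring row order but not duplicates. `compare_results` is a made-up helper, not Facebook's tooling.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def compare_results(hive_rows: Iterable[Sequence], spark_rows: Iterable[Sequence]) -> Tuple[Counter, Counter]:
    """Return (rows only in the Hive output, rows only in the Spark output)."""
    hive = Counter(tuple(r) for r in hive_rows)
    spark = Counter(tuple(r) for r in spark_rows)
    # Counter subtraction keeps only positive counts, i.e. the surplus on each side.
    return hive - spark, spark - hive


hive_out = [("a", 1), ("b", 2), ("b", 2)]
spark_out = [("b", 2), ("a", 1)]
only_hive, only_spark = compare_results(hive_out, spark_out)
print(only_hive, only_spark)
```

Here Spark is missing one duplicate `("b", 2)` row. Floating-point columns would need a tolerance rather than exact equality.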

Debugging Big Data Analytics in Apache Spark with BigDebug

Another research project from UCLA. It enhances Apache Spark with debugging features such as breakpoints and watchpoints, and lets you write a function that fixes faulty input to mitigate crashes. They are trying to bring this to Spark 2.1, and currently they don't support Spark SQL.

I kind of like this idea, but it is hard to use. Assume we're running a complex pipeline: this stop-and-check strategy is kind of …
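The watchpoint and crash-remediation ideas can be sketched on a plain Python list. This is not BigDebug's API; `debug_map`, `watch` and `fix` are hypothetical names. The wrapper captures input records that match a watch predicate, and when the per-record function raises, it retries once with a user-supplied repair function, collecting the record as a crash culprit instead of failing the whole job.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


def debug_map(
    records: Iterable[Any],
    fn: Callable[[Any], Any],
    watch: Optional[Callable[[Any], bool]] = None,
    fix: Optional[Callable[[Any], Any]] = None,
) -> Tuple[List[Any], List[Any], List[Tuple[Any, str]]]:
    """Map fn over records; returns (outputs, watched records, (culprit, error) pairs)."""
    outputs, watched, culprits = [], [], []
    for rec in records:
        if watch is not None and watch(rec):
            watched.append(rec)  # "watchpoint": remember inputs we want to inspect
        try:
            outputs.append(fn(rec))
            continue
        except Exception as exc:
            error = exc
        if fix is not None:
            try:
                outputs.append(fn(fix(rec)))  # retry once with the repaired record
                continue
            except Exception as exc:
                error = exc
        culprits.append((rec, repr(error)))  # keep going instead of crashing
    return outputs, watched, culprits


outputs, watched, culprits = debug_map(
    ["1", "2", "3,000", "oops"],
    fn=int,
    watch=lambda s: not s.isdigit(),
    fix=lambda s: s.replace(",", ""),
)
print(outputs, watched, culprits)
```

`"3,000"` is repaired to `3000`, while `"oops"` cannot be fixed and ends up in `culprits` with its error message, so the rest of the data is still processed.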

Productive Use of the Apache Spark Prompt

Another talk about debugging Spark. The speaker, from Mozilla, shared some of his thinking on debugging patterns.

Taking Jupyter Notebooks and Apache Spark to the Next Level with PixieDust

Another talk about using Spark from Jupyter Notebook.
